-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update MuJoCo to 3.X #493
Update MuJoCo to 3.X #493
Conversation
I initially got it working and visually verified that the arm position looked like MuJoCo 2 using |
Does this update break the environments with besides, visual validation of the rendered arm, has there been done any other validation, can you collect a trajectory of 1k steps with the old version of metaworld and mujoco and check how it compares with the new version of metaworld and mujoco (with the same action) Thanks |
With regards to validation, I did not run much before opening this pull request as I was requested by another maintainer to open it as soon as possible to run some RL experiments against it and verify that algorithm performance is not impacted. That said I ran some comparisons today.
MP4 videosreach-v2_mujoco2.mp4reach-v2_mujoco3.mp4Web videosreach-v2_mujoco2.webmreach-v2_mujoco3.webmIn case this is not satisfactory, the mai thing I can think of to better match the old behaviour is to perhaps run a hyperparameter tuner on the weld constraint parameters using these losses against the mujoco 2 / old weld parameter trajectory, which I might do anyway out of curiosity now that I have this set up. Also validation RL experiments are still planned to check for behavioural similarity (which in my opinion is what matters more than the actual error between observation values). |
I ran some hyper parameter tuning and it looks like with torquescale set to around 8.2 we can get an even closer match
|
|
There's a slight degradation in performance when training a multi-task algorithm. The simulation is more accurate with Mujoco >= 3 which is likely why. We're digging into it to make sure that's the case
Yes, this will break for Mujoco <= 2.3.7. We do not want to support Mujoco <= 2.3.7, so I agree that a version bump is necessary. We also are working on getting the docs and a few other things ready for a polished V3 release so I think this can be included in that release. |
I don't think it matters what environment it's fine-tuned for as long as the difference in rewards isn't used as the objective to minimise as the parameters here only affect the robot arm itself and how it responds to actions randomly sampled uniformly from its action space. That said the difference is infinitesimal to begin with and as is shown in the tuning report the difference for values between 5 and 9 is almost within margin of error, so I don't think the actual value matters as long as it's within that range. I will say anecdotally I have never seen tests fail for torquescale=5 (my initial value) but I have seen them fail (at 0.39/0.5 or 0.4/0.5 like in this latest CI run) for 8.22 which is what the tuner found. I might revert it back to 5. Also I definitely agree that there is no need to match mujoco==2.3.7 down to arbitrary precision, but I thought it would be interesting to try.
The entire rotational part of the weld constraint is ignored in mujoco==2.3.7 due to a bug so any changes in those values actually shouldn't matter. To verify, I ran the tests with the quat being (-1, 0, 0, 0) and torquescale=1 with mujoco==2.3.7 and they pass. But as @reginald-mclean said, I don't think supporting mujoco<=2.3.7 is desirable anyway and a version bump for supporting MuJoCo 3 would make the most sense. Also I have finished running a multi-task algorithm (MTMHSAC) on the MT10 benchmark for 3 seeds and the results seem in line with previous ones obtained on MuJoCo 2.3.7. An MT50 run is still planned but I am currently facing some technical difficulties getting experiments running on my university's compute cluster, which is the only compute I have access to that could tractably handle MT50 at the moment. |
@rainx0r have we confirmed that everything looks good from a performance standpoint using torquescale 5? |
I'd say so. I've got MT50 runs done more or less (one of the seeds got cut off at 96M timesteps and my compute is out of commission at the moment so I can't resume it) which can be seen on this report. MT10 runs can be found on the report posted previously for torquescale 8.2, and for torquescale 5 there are some newer runs here which seem to replicate the 8.2 results. Since MT10 and MT50 runs seem to work and reproduce existing results then this should indicate that the benchmark does indeed work on MuJoCo 3. Also below are some comments on the MuJoCo 3-related settings after I've done some more thinking and research into them. tl;dr I think relpose quat could be reverted to (1,0,0,0) but it doesn't really matter, and torquescale should stay at 5 even though it also doesn't seem to matter much, but I'm open to suggestions for ways to find a better value if anyone has any. P.S. 1: On the torquescale valueI mostly switched from torquescale 8.2 to 5 in an attempt to fix my failing MT50 runs but it seems like the problem was elsewhere. After debugging my algorithm implementation a bit more I was also able to more or less replicate the successful torquescale=5 MT50 runs on torquescale=8.2 on local compute for 1 seed. So overall the value doesn't really seem to matter to the RL algorithm. My opinion would be to stick with 5 for simplicity, unless there are any suggestions for how to potentially get a better value. The optimisation metric I used for my previous experiment that found the value of 8.2 was very noisy so I don't have much faith in that value especially as there's no clear trend in the hyper parameter sweep. The distance between the observations across the two versions being so small as to be noisy as well as the two values not mattering to the RL algorithm should indicate that the value ultimately isn't that important past a certain point (roughly 4.0) though, unless I'm missing something. P.S. 2: On the relpose quatAs for the relpose quat value, it's important that it's not (0,0,0,0)--which sets relpose based on qpos0. Then both (1,0,0,0) and (-1,0,0,0) represent the same rotation, so both should work and indeed at the same torquescale both have tests pass. I used (-1,0,0,0) when I was trying out different things to get MuJoCo 3 working (and frankly I hadn't realised it was the same (lack of) rotation as (1,0,0,0) as I only read more into quaternions later; I'm sadly not an engineering major nor a game dev). We could revert it to (1,0,0,0) for simplicity and to avoid changing things when not necessary, or keep it as is since it does represent the same rotation after all and we have confirmed that (-1,0,0,0) works for RL. Personally I think it shouldn't matter. |
@rainx0r thank you for the update and the hard work! Results look good to me |
cca35cf
into
Farama-Foundation:master
A key part missing from recent Metaworld renovation work is to stop depending on MuJoCo 2.X. Upgrading to MuJoCo 3.X opens up useful features such as:
The main reason for not using MuJoCo 3.X before was that the benchmark did not work as intended and appeared to be broken (i.e. tests would fail).
Over the past few months, I did a deep dive on why this is and how to fix it. You can find a relatively in-depth documentation of the process and the findings in this issue on the MuJoCo repo.
In the end I had to make a single change to get it working:
1.0
to-1.0
achieves, which corresponds to the first part of the quaternion in the rotational part of the relpose attribute of the weld constraint), and also to weigh the rotational part of the constraint quite high (set torquescale, the last part of the constraint data, to 3.75 from 1.0)(For an explanation as to why this change was necessary please see the discussion on the this issue in the MuJoCo repo)
I tried different values for torquecale but found that at 3.5 soccer breaks, and at 3.75 and up everything works.